
Can't let one rpc fail fuck things up. #1

Merged
AamirAlam merged 7 commits into main from public-rpc-fallback
Apr 5, 2026

@br0wnD3v commented Apr 2, 2026

Global RPC Fallback for MrOracle

3-layer JSON-RPC fallback (primary → backup → public) covering every RPC call MrOracle makes, plus correctness fixes found during an audit.

Problem

MrOracle used a single RPC endpoint with no failover. When that endpoint went down, oracle price updates silently stopped: no prices were pushed on-chain, which blocked all user position operations (close, SL, TP, etc.).

Solution

3-layer JSON-RPC fallback (primary → backup → public) covering ALL RPC operations:

  • Pool account fetch at startup
  • Priority fee polling (every 5s)
  • Oracle price update transaction signing, sending, and confirmation

Per endpoint: get blockhash → sign → send → confirm (4 polls at 500ms, up to 2s). On-chain errors short-circuit the fallback chain: deterministic failures are not retried on the backup or public endpoint.
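A minimal std-only sketch of the fallback loop with the on-chain short-circuit. The `SendError` enum and its `Transport`/`OnChain` split are hypothetical stand-ins for however MrOracle classifies RPC errors, and the `attempt` closure stands in for the blockhash → sign → send → confirm pipeline run against a single endpoint.

```rust
/// Hypothetical error classification: transport failures vs deterministic on-chain failures.
#[derive(Debug, PartialEq)]
enum SendError {
    /// Endpoint unreachable, timed out, or unhealthy: worth trying the next endpoint.
    Transport(String),
    /// The transaction itself failed on-chain: every endpoint would fail the same way.
    OnChain(String),
}

/// Try each endpoint in order (primary, backup, public).
/// Transport errors fall through to the next endpoint; an on-chain error
/// is returned immediately and the remaining endpoints are never contacted.
fn send_with_fallback<F>(endpoints: &[&str], mut attempt: F) -> Result<String, SendError>
where
    F: FnMut(&str) -> Result<String, SendError>,
{
    let mut last_err = SendError::Transport("no endpoints configured".to_string());
    for &url in endpoints {
        match attempt(url) {
            Ok(signature) => return Ok(signature),
            // Deterministic failure: retrying elsewhere would only burn fees.
            Err(e @ SendError::OnChain(_)) => return Err(e),
            // Endpoint problem: remember it and move on to the next endpoint.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Because an `OnChain` error returns immediately, a transaction that fails deterministically is submitted once rather than once per endpoint, which is where the priority-fee saving comes from.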

Stateless — each RPC call starts fresh from primary. If primary recovers, the very next call hits it first. No circuit breaker, no stickiness.
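The statelessness can be illustrated with a hypothetical `RpcChain` type that stores only the ordered endpoint list and no "current endpoint" cursor, so every call scans from index 0 (primary). This is a std-only sketch, not MrOracle's actual type.

```rust
/// Hypothetical stateless fallback chain: it stores only the ordered endpoint
/// list and no "current endpoint" cursor, so every call starts at primary.
struct RpcChain {
    endpoints: Vec<String>,
}

impl RpcChain {
    /// Runs `op` against each endpoint in order and returns the index of the
    /// first endpoint that succeeded along with its result (None if all failed).
    /// Takes `&self`: nothing about previous calls is remembered.
    fn call<T>(&self, mut op: impl FnMut(&str) -> Option<T>) -> Option<(usize, T)> {
        self.endpoints
            .iter()
            .enumerate()
            .find_map(|(i, url)| op(url.as_str()).map(|value| (i, value)))
    }
}
```

A circuit breaker would instead remember recent primary failures and skip primary for a while; this design trades one extra failed attempt per call during an outage for immediate recovery as soon as primary is healthy again.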

CLI args

  • --rpc (renamed from --endpoint) — primary JSON-RPC endpoint
  • --rpc-backup — optional backup RPC endpoint
  • --rpc-public — last-resort public RPC (defaults to api.mainnet-beta.solana.com)
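A std-only sketch of how the three flags and the public default could map onto a config struct. `RpcArgs` and `parse_rpc_args` are hypothetical names, only the three RPC flags are handled, and the real binary may well use a parser crate such as clap instead.

```rust
/// Default last-resort endpoint, per the --rpc-public description.
const DEFAULT_PUBLIC_RPC: &str = "https://api.mainnet-beta.solana.com";

#[derive(Debug, PartialEq)]
struct RpcArgs {
    rpc: String,                // required primary endpoint
    rpc_backup: Option<String>, // optional backup endpoint
    rpc_public: String,         // public fallback, defaulted when absent
}

/// Parses `--flag value` pairs for the three RPC flags only.
fn parse_rpc_args(args: &[&str]) -> Result<RpcArgs, String> {
    let mut rpc: Option<String> = None;
    let mut rpc_backup: Option<String> = None;
    let mut rpc_public: Option<String> = None;
    let mut it = args.iter();
    while let Some(&flag) = it.next() {
        // Pick which field this flag fills.
        let slot = match flag {
            "--rpc" => &mut rpc,
            "--rpc-backup" => &mut rpc_backup,
            "--rpc-public" => &mut rpc_public,
            other => return Err(format!("unknown argument: {other}")),
        };
        let value = it.next().ok_or_else(|| format!("{flag} needs a value"))?;
        *slot = Some(value.to_string());
    }
    Ok(RpcArgs {
        rpc: rpc.ok_or("--rpc is required")?,
        rpc_backup,
        rpc_public: rpc_public.unwrap_or_else(|| DEFAULT_PUBLIC_RPC.to_string()),
    })
}
```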

Bug fixes (audit findings)

  • --commitment actually plumbed through: the CLI arg was parsed but ignored, so passing --commitment processed still produced the default (finalized or confirmed) behavior. It now flows through to RpcClient::new_with_timeout_and_commitment(), which affects blockhash freshness, transaction confirmation polling, and account reads.
  • Priority fee tiering: the fetching logic returned the mean of the unfiltered fees, because the Triton percentile param is a server-side extension that doesn't work on public RPCs. Percentiles are now computed client-side.
  • On-chain error short-circuit: sign_and_send no longer retries deterministic on-chain errors across endpoints, which avoids wasting priority fees.
  • MutexGuard held across await: the priority fee lock was held for the entire tx send/confirm (potentially 30s+ during fallback), blocking the refresher task. The fee value is now copied out before the await.
  • Dead code removal: removed the periodical_priority_fees_fetching_task.take().abort() call, a no-op because it always ran on a freshly initialized None.
  • get_signature_statuses timeout: wrapped in an explicit tokio::time::timeout so the worst-case time per endpoint is predictable.
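Client-side tiering needs only the raw fee samples, for example the `prioritizationFee` values returned by the standard `getRecentPrioritizationFees` RPC method, which public endpoints serve. The sketch below uses the nearest-rank percentile method; that method, the `percentile`/`mean` helper names, and which percentiles map to which tier are assumptions, not MrOracle's confirmed choices.

```rust
/// Nearest-rank percentile of fee samples (micro-lamports per compute unit).
/// `p` is a whole-number percentile in 0..=100; returns None for no samples.
fn percentile(fees: &[u64], p: u64) -> Option<u64> {
    assert!(p <= 100, "percentile must be in 0..=100");
    if fees.is_empty() {
        return None;
    }
    let mut sorted = fees.to_vec();
    sorted.sort_unstable();
    let n = sorted.len() as u64;
    // Nearest rank = ceil(p * n / 100), clamped to 1..=n so p = 0 maps to the minimum.
    let rank = ((p * n + 99) / 100).clamp(1, n);
    Some(sorted[(rank - 1) as usize])
}

/// Plain arithmetic mean, shown only for comparison with the percentile tiers.
fn mean(fees: &[u64]) -> Option<u64> {
    if fees.is_empty() {
        return None;
    }
    Some(fees.iter().sum::<u64>() / fees.len() as u64)
}
```

With the sample fees `[0, 0, 0, 100, 200, 300, 400, 500, 10000, 20000]`, the mean is 3150 while the 50th percentile is 200, so a single mean cannot express distinct low/medium/high tiers.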
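The MutexGuard fix comes down to copying the value out in a short-lived statement so the guard drops before the slow part. This std-only analogue uses `std::sync::Mutex` and a synchronous `send` closure in place of the async send/confirm `.await`; `SharedFee` and `send_with_current_fee` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared fee value, written by a refresher and read by the sender.
type SharedFee = Arc<Mutex<u64>>;

/// Copies the fee out so the guard is dropped *before* the long-running
/// send/confirm (the `.await` in the async original) starts.
fn send_with_current_fee<F>(fee: &SharedFee, send: F) -> u64
where
    F: FnOnce(u64) -> u64,
{
    // The temporary MutexGuard is dropped at the end of this statement.
    let micro_lamports = *fee.lock().unwrap();
    send(micro_lamports)
}
```

While `send` runs, another holder (the refresher task, in MrOracle's case) can still take the lock; had the guard been held across the call, the refresher would have waited until send/confirm finished.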
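`tokio::time::timeout(duration, future)` resolves to `Err(Elapsed)` if the future does not complete in time, which bounds each endpoint's wait. To keep the example dependency-free, the sketch below swaps in a std-only analogue (a worker thread plus `mpsc::Receiver::recv_timeout`); `with_timeout` is a hypothetical helper, and unlike tokio's version it cannot cancel the work, which keeps running in the background.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a worker thread and gives up after `limit`.
/// Returns None on timeout; the worker is not cancelled (tokio's timeout
/// would instead drop the future).
fn with_timeout<T, F>(limit: Duration, op: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if we already timed out.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).ok()
}
```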

What's NOT touched

  • Core 5-second loop timing (though the worst case stretches to ~30s during a full fallback; documented in the README)
  • Database price fetching + retry logic (3 attempts with exponential backoff)
  • ChaosLabs batch format building
  • let _ = non-blocking error handling in the main loop (the service never crashes on total RPC failure)
  • updatePoolAum instruction structure (raw Vec build produces byte-identical tx vs anchor RequestBuilder)

Usage

--rpc https://primary-rpc.example.com/
--rpc-backup https://backup-rpc.example.com/
--rpc-public https://api.mainnet-beta.solana.com

Without --rpc-backup: primary → public. With it: primary → backup → public.
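The ordering rule can be written as a small function. `fallback_order` is a hypothetical name for illustration; it simply slots the optional backup between primary and public.

```rust
/// Builds the ordered fallback chain from the CLI values:
/// without a backup -> [primary, public]; with one -> [primary, backup, public].
fn fallback_order(rpc: &str, rpc_backup: Option<&str>, rpc_public: &str) -> Vec<String> {
    let mut chain = vec![rpc.to_string()];
    if let Some(backup) = rpc_backup {
        chain.push(backup.to_string());
    }
    chain.push(rpc_public.to_string());
    chain
}
```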

@br0wnD3v requested review from AamirAlam and ElementalBrian and removed request for ElementalBrian April 2, 2026 22:28

@AamirAlam left a comment


LGTM

@AamirAlam AamirAlam merged commit 6072424 into main Apr 5, 2026